A Genetic EM Algorithm for Learning the Optimal Number of Components of Mixture Models

Authors

  • Wei Lu
  • Issa Traore
Abstract

Mixture models have been widely used in cluster analysis. Traditional mixture density-based clustering algorithms usually predefine the number of clusters via random selection or content-based knowledge. An improper pre-selection of the number of clusters can easily lead to a poor clustering outcome. The expectation-maximization (EM) algorithm is a common approach to estimating the parameters of mixture models. However, EM is prone to converging to a local maximum after a limited number of iterations. Moreover, EM usually assumes that the number of mixing components is known in advance, which is not always the case in practice. To address these issues, we propose in this paper a new genetic EM algorithm to learn the optimal number of components of mixture models. Specifically, the algorithm defines an entropy-based fitness function and two genetic operators for splitting and merging components. We conducted two sets of experiments using a synthetic dataset and two existing benchmarks to validate our genetic EM algorithm. The results obtained in the first experiment show that the algorithm can exactly estimate the optimal number of clusters for a set of data. In the second experiment, we computed three major clustering validity indices and compared the corresponding results with those obtained using established clustering techniques, and found that our genetic EM algorithm achieves better clustering structures.

Key-Words: Clustering, Gaussian mixture model, EM algorithm, Genetic algorithm, Clustering validity
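To make the problem setting concrete, the following is a minimal sketch of choosing the number of components of a one-dimensional Gaussian mixture. It is not the authors' method: the abstract does not give the entropy-based fitness function or the split and merge operators, so this sketch substitutes plain EM, an exhaustive search over candidate values of k, and the Bayesian information criterion (BIC) as the selection score. All function names, the variance floor, and the chunk-based initialization are assumptions made for illustration.

```python
import math
import random


def _log_terms(x, weights, means, variances):
    """Log of weight * Gaussian density for each component at point x."""
    return [
        math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for w, m, v in zip(weights, means, variances)
    ]


def log_likelihood(data, weights, means, variances):
    """Total log-likelihood of 1-D data under a Gaussian mixture."""
    total = 0.0
    for x in data:
        logs = _log_terms(x, weights, means, variances)
        top = max(logs)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(l - top) for l in logs))
    return total


def fit_em(data, k, iters=100):
    """Plain EM for a k-component 1-D mixture (not the genetic variant)."""
    xs = sorted(data)
    n = len(xs)
    floor = 1e-6  # assumed variance floor to keep components from collapsing
    # Deterministic start: one component per contiguous chunk of sorted data.
    chunks = [xs[i * n // k:(i + 1) * n // k] for i in range(k)]
    weights = [len(c) / n for c in chunks]
    means = [sum(c) / len(c) for c in chunks]
    variances = [max(sum((x - m) ** 2 for x in c) / len(c), floor)
                 for c, m in zip(chunks, means)]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            logs = _log_terms(x, weights, means, variances)
            top = max(logs)
            exps = [math.exp(l - top) for l in logs]
            s = sum(exps)
            resp.append([e / s for e in exps])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, floor)
            weights[j] = nj / n
    return weights, means, variances


def select_k_by_bic(data, k_max):
    """Fit k = 1..k_max and return the k (and parameters) with the lowest BIC."""
    n = len(data)
    best = None
    for k in range(1, k_max + 1):
        params = fit_em(data, k)
        # A 1-D mixture with k components has 3k - 1 free parameters.
        bic = -2.0 * log_likelihood(data, *params) + (3 * k - 1) * math.log(n)
        if best is None or bic < best[0]:
            best = (bic, k, params)
    return best[1], best[2]


# Synthetic data: two well-separated Gaussian clusters.
rng = random.Random(0)
data = ([rng.gauss(0.0, 1.0) for _ in range(200)]
        + [rng.gauss(10.0, 1.0) for _ in range(200)])
best_k, (weights, means, variances) = select_k_by_bic(data, k_max=4)
print(best_k, [round(m, 2) for m in means])
```

Exhaustive search refits the model from scratch for every candidate k. A genetic approach with split and merge operators, as the abstract describes, would instead move between neighbouring model sizes, which this sketch does not attempt.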

Similar resources

Unsupervised learning of regression mixture models with unknown number of components

Regression mixture models are widely studied in statistics, machine learning and data analysis. Fitting regression mixtures is challenging and is usually performed by maximum likelihood by using the expectation-maximization (EM) algorithm. However, it is well-known that the initialization is crucial for EM. If the initialization is inappropriately performed, the EM algorithm may lead to unsatis...

Solving Redundancy Allocation Problem with Repairable Components Using Genetic Algorithm and Simulation Method

Reliability optimization has wide application in engineering. One of the most important problems in reliability is the redundancy allocation problem (RAP). In this research, we worked on a RAP with repairable components and a k-out-of-n sub-system structure. The objective was to maximize system reliability under cost and weight constraints. The aim was determining optimal com...

Application of the Genetic Algorithm to Calculate the Interaction Parameters for Multiphase and Multicomponent Systems

A method based on the Genetic Algorithm (GA) was developed to study the phase behavior of multicomponent and multiphase systems. Upon application of the GA to the thermodynamic models which are commonly used to study the VLE, VLLE and LLE phase equilibria, the physically meaningful values for the Binary Interaction Parameters (BIP) of the models were obtained. Using the method proposed in t...

Sequential and Mixed Genetic Algorithm and Learning Automata (SGALA, MGALA) for Feature Selection in QSAR

Feature selection is of great importance in Quantitative Structure-Activity Relationship (QSAR) analysis. This problem has been solved using meta-heuristic algorithms such as GA, PSO, ACO, and SA. In this work two novel hybrid meta-heuristic algorithms, i.e. Sequential GA and LA (SGALA) and Mixed GA and LA (MGALA), which are based on Genetic algorithm and learning automata for QSAR f...

Publication date: 2006